A trimmed mean approach to finding spatial outliers

Authors

  • Tianming Hu
  • Sam Yuan Sung
Abstract

Outlier detection concerns discovering unusual data whose behavior is exceptional compared to other data. In contrast to non-spatial outliers, which are defined using non-spatial attributes only, spatial outliers are those sites that differ markedly from their neighbors, where neighborhoods are defined in terms of spatial attributes, i.e., locations. In this paper, we propose a local trimmed mean approach to evaluating the spatial outlier factor, which is the degree to which a site is outlying compared to its neighbors. The structure of our approach strictly follows the general spatial data model, which states that spatial data consist of trend, dependence, and error. We empirically demonstrate that the trimmed mean is more outlier-resistant than the median in estimating sample location, and we employ it to estimate spatial trend in our approach. In addition to using the first-order neighbors in computing error, we also use higher-order neighbors to estimate spatial trend. With the true outlier factor assumed to be given by the spatial error model, we compare our approach with the spatial statistic and the scatter plot. Experimental results on two real datasets show that our approach is significantly better than the scatter plot and slightly better than the spatial statistic.
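The idea of scoring a site against a trimmed mean of its neighbors can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details the abstract does not give: the function names, the `trim_fraction` parameter, the toy one-dimensional chain of sites, and the use of a plain absolute difference as the outlier factor are all assumptions made for illustration. In particular, the sketch uses a single neighborhood, whereas the abstract describes using first-order neighbors for the error and higher-order neighbors for the trend.

```python
def trimmed_mean(values, trim_fraction=0.2):
    """Mean of the values left after dropping the smallest and largest
    trim_fraction share of the sorted sample (assumed symmetric trimming)."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)  # number dropped from each end
    kept = ordered[k:len(ordered) - k] if len(ordered) > 2 * k else ordered
    return sum(kept) / len(kept)


def outlier_factors(values, neighbors, trim_fraction=0.2):
    """Hypothetical outlier factor: absolute gap between a site's attribute
    and the trimmed mean of its neighbors' attributes.

    values:    dict mapping site -> non-spatial attribute value
    neighbors: dict mapping site -> list of neighboring sites
    """
    return {
        site: abs(values[site]
                  - trimmed_mean([values[n] for n in neighbors[site]],
                                 trim_fraction))
        for site in values
    }


# Toy data (invented): sites on a line, neighbors are sites within two steps.
values = {0: 10.0, 1: 11.0, 2: 10.0, 3: 50.0, 4: 11.0, 5: 10.0, 6: 11.0}
neighbors = {s: [n for n in values if n != s and abs(n - s) <= 2]
             for s in values}

factors = outlier_factors(values, neighbors, trim_fraction=0.25)
most_outlying = max(factors, key=factors.get)
```

In this toy data, site 3 receives the largest factor, while its neighbor site 2 stays close to its local estimate because the extreme value 50 is trimmed away before averaging. Sites with only three neighbors are not trimmed at this fraction, which shows how the choice of neighborhood size and trim fraction interact.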

Similar resources

Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data

Background and purpose: With advances in science, knowledge, and technology, we deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation of the coefficients and their interpretation. For high-dimensional problems, classical methods are not reliable because of a large number of predictor variable...

Robust Gaussian Graphical Modeling with the Trimmed Graphical Lasso

Gaussian Graphical Models (GGMs) are popular tools for studying network structures. However, many modern applications such as gene network discovery and social interactions analysis often involve high-dimensional noisy data with outliers or heavier tails than the Gaussian distribution. In this paper, we propose the Trimmed Graphical Lasso for robust estimation of sparse GGMs. Our method guards ...

Robust artificial neural networks and outlier detection. Technical report

Large outliers break down linear and nonlinear regression models. Robust regression methods allow one to filter out the outliers when building a model. By replacing the traditional least squares criterion with the least trimmed squares criterion, in which half of data is treated as potential outliers, one can fit accurate regression models to strongly contaminated data. High-breakdown methods h...

Trimmed estimators for robust averaging of event-related potentials.

Averaging (in statistical terms, estimation of the location of data) is one of the most commonly used procedures in neuroscience and the basic procedure for obtaining event-related potentials (ERP). Only the arithmetic mean is routinely used in the current practice of ERP research, though its sensitivity to outliers is well-known. Weighted averaging is sometimes used as a more robust procedure,...

Mammalian Eye Gene Expression Using Support Vector Regression to Evaluate a Strategy for Detecting Human Eye Disease

Background and purpose: Machine learning is a class of powerful modern tools that can solve many important problems people face today. Support vector regression (SVR), a member of the machine learning family, is a way to build a regression model. SVR has been proven to be an effective tool in real-value function estimation. As a supervised-learning appr...

Journal:
  • Intell. Data Anal.

Volume 8  Issue 

Pages  -

Publication date 2004